PRELIMINARY AMENDMENT

Assistant Commissioner for Patents 
Washington, D.C. 20231 

Sir: 

Prior to the calculation of fees and examination of 
the present application, please enter the amendments and 
remarks set out below. 

In the Drawings : 

Submitted herewith is a request for a proposed 
drawing modification to correct an informality in FIGs. 5-7, 
9, 10, 15 and 23 as indicated in red ink. 

In the Claims : 

Please cancel Claims 1-4. 

Please add new Claims 5-12. 
5. A method of calculating the discrete cosine
transform (DCT) of blocks of pixels of an image, comprising
the steps of:

defining first subdivision blocks as range blocks,
having a fractional and scalable size N/2^i*N/2^i, where i is an
integer;

defining second subdivision blocks of N*N pixels as
domain blocks, shiftable by intervals of N/2^i pixels; and

calculating, in parallel, the DCT of 2^i range blocks
of a domain block of N*N pixels of the image.

6. A method according to Claim 5, wherein the step
of calculating comprises the steps of:

a) ordering the pixels in the range blocks of a
certain dimension by rearranging input pixels in 2^i vectors of
2^i components;

b) calculating, in parallel, 2^i monodimensional DCTs
by processing the vectors defined in the step a);

c) arranging output sequences of the monodimensional
DCTs relative to the 2^i vectors;

d) completing the calculation in parallel of 2^i
bidimensional DCTs by processing output sequences of
monodimensional DCTs produced in step c); and

e) arranging output sequences of bidimensional DCTs
generated in step d) in 2^i vectors of bidimensional DCT
coefficients.



7. A method according to Claim 6, wherein the step
of calculating 2^i monodimensional DCTs in parallel in step b)
and the step of completing the parallel calculation of 2^i
bidimensional DCTs of step d) are performed by subdividing the
sequences resulting from step a) and from step c),
respectively, in groups of scalar elements, calculating the
sums and differences thereof by way of adders and subtractors
and by iteratively multiplying the sum and difference results
by respective coefficients until completing the calculation of
the relative DCT coefficients, respectively monodimensional
and bidimensional.



8. A method of compressing data of an image to be
stored or transmitted, comprising the steps of:

defining first subdivision blocks as range blocks,
having a fractional and scalable size N/2^i*N/2^i, where i is an
integer;

defining second subdivision blocks of N*N pixels as
domain blocks, shiftable by intervals of N/2^i pixels;

calculating, in parallel, the DCT of 2^i range blocks
and of a relative domain block;

classifying the transformed range blocks according
to their relative complexity represented by a sum of values of
three AC coefficients;

applying a fractal transform in the DCT domain to
data of the range blocks whose complexity classification
exceeds a pre-defined threshold and only storing a DC
coefficient of the range blocks with a complexity lower than
the threshold, while identifying a relative domain block to
which the range block in a transformation belongs that
produces a best fractal approximation of the range block;

calculating a difference between each range block
and its fractal approximation;

quantizing the difference in the DCT domain by using
a quantization table preestablished in consideration of human
sight characteristics;

coding the quantized difference by a process based
on probabilities of quantization coefficients; and

storing or transmitting code of each range block
compressed in the DCT domain and the DC coefficient of each
uncompressed range block.



9. An apparatus for calculating the discrete
cosine transform (DCT) of blocks of pixels of an image, the
apparatus comprising:

means for defining first subdivision blocks as range
blocks, having a fractional and scalable size N/2^i*N/2^i, where
i is an integer;

means for defining second subdivision blocks of N*N
pixels as domain blocks, shiftable by intervals of N/2^i pixels;
and

means for calculating, in parallel, the DCT of 2^i
range blocks of a domain block of N*N pixels of the image.

10. An apparatus according to Claim 9, wherein the
means for calculating comprises:

means for ordering the pixels in the range blocks of
a certain dimension by rearranging input pixels in 2^i vectors
of 2^i components;

means for calculating, in parallel, 2^i
monodimensional DCTs by processing the vectors defined by the
means for calculating;

means for arranging output sequences of the
monodimensional DCTs relative to the 2^i vectors;

means for completing the calculation in parallel of
2^i bidimensional DCTs by processing output sequences of
monodimensional DCTs produced by the means for arranging output
sequences of the monodimensional DCTs; and

means for arranging output sequences of
bidimensional DCTs, generated by the means for completing the
calculation, in 2^i vectors of bidimensional DCT coefficients.



11. An apparatus according to Claim 10, wherein the
means for calculating 2^i monodimensional DCTs in parallel
and the means for completing the parallel calculation of 2^i
bidimensional DCTs are for subdividing the sequences resulting
from the means for ordering and the means for arranging output
sequences of the monodimensional DCTs, respectively, in groups
of scalar elements, calculating the sums and differences
thereof by way of adders and subtractors and by iteratively
multiplying the sum and difference results by respective
coefficients until completing the calculation of the relative
DCT coefficients, respectively monodimensional and
bidimensional.

12. An apparatus for compressing data of an image
to be stored or transmitted, comprising:

means for defining first subdivision blocks as range
blocks, having a fractional and scalable size N/2^i*N/2^i, where
i is an integer;

means for defining second subdivision blocks of N*N
pixels as domain blocks, shiftable by intervals of N/2^i pixels;

means for calculating, in parallel, the DCT of 2^i
range blocks and of a relative domain block;

means for classifying the transformed range blocks 
according to their relative complexity represented by a sum of 
values of three AC coefficients; 

means for applying a fractal transform in the DCT 
domain to data of the range blocks whose complexity 
classification exceeds a pre-defined threshold and only 
storing a DC coefficient of the range blocks with a complexity 
lower than the threshold, while identifying a relative domain 
block to which the range block in a transformation belongs 
that produces a best fractal approximation of the range block; 

means for calculating a difference between each 
range block and its fractal approximation; 

means for quantizing the difference in the DCT 
domain by using a quantization table preestablished in 
consideration of human sight characteristics; 

means for coding the quantized difference by a 
process based on probabilities of quantization coefficients; 
and 

means for storing or transmitting code of each range 
block compressed in the DCT domain and the DC coefficient of 
each uncompressed range block. 






In re Patent Application of 
PAU ET AL . 

Serial No. Not Yet Assigned 
Filed: Herewith 



It is believed that all of the claims are patentable 
over the prior art. Accordingly, after the Examiner completes 
a thorough examination and finds the claims patentable, a 
Notice of Allowance is respectfully requested in due course. 
Should the Examiner determine any minor informalities that 
need to be addressed, he is encouraged to contact the 
undersigned attorney at the telephone number below. 
METHOD AND SCALABLE ARCHITECTURE FOR PARALLEL
CALCULATION OF THE DCT OF BLOCKS OF PIXELS OF DIFFERENT 
SIZES AND COMPRESSION THROUGH FRACTAL CODING 



Field of the Invention

The invention relates in general to digital
processing systems for recording and/or transmitting
pictures, and more particularly to systems for
compressing and coding pictures by calculating the
discrete cosine transform (DCT) of blocks of pixels of a
picture. The invention is particularly useful in video
coders according to the MPEG2 standard, though it is
also applicable to other systems.

Background of the Invention

The calculation of the discrete cosine transform
(DCT) of a pixel matrix of a picture is a fundamental
step in processing picture data. A division by a
quantization matrix is performed on the results of the
discrete cosine transform to reduce the amplitude of
the DCT coefficients, as a precondition to data
compression, which occurs during a coding phase
according to a certain transfer protocol of the video data
to be transmitted or stored. Typically, the calculation
of the discrete cosine transform is carried out on
blocks or matrices of pixels, into which a whole picture
is subdivided for processing purposes.



Increasing speed requirements of picture processing
systems for storage and/or transmission impose the use
of hardware architectures to speed up various processing
steps, primarily the step of discrete
cosine transform calculation by blocks of pixels. Use
of hardware processing imposes a pre-definition of a few
fundamental parameters, namely the dimensions of the
blocks of pixels into which a picture is subdivided to
meet processing requirements.

Such a pre-definition may represent a heavy
constraint that limits the possibility of optimizing the
processing system, for example an MPEG2 coder, or its
adaptability to different conditions of use in terms of
different performance requirements. There is also an
evident economic advantage, in terms of reduced
costs, in an integrated data processing system that may
be programmed to calculate in parallel the DCT on
several blocks of pixels of a size selectable among a
certain number of available sizes.

Summary of the Invention 

A need therefore exists for a method and
a hardware architecture for calculating the discrete
cosine transform (DCT) on a plurality of blocks of
pixels, in parallel, which provide for the scalability
of the size of the blocks of pixels: for example, the
calculation of the discrete cosine transform (DCT)
either for one block of 8x8 pixels, or for four blocks of
4x4 pixels in parallel, or for sixteen blocks of 2x2
pixels in parallel, according to a selection of the block
size.

Scalability of the size of the block of pixels and
the possibility of performing the calculation of the
discrete cosine transform in parallel on blocks of
congruently reduced size, compared to a certain maximum
block size, by a hardware structure are also
instrumental to the implementation of highly efficient
"hybrid" picture compression schemes and algorithms.
For example, by virtue of the scalability of the block
size and of the ability to calculate in parallel the DCT
on more blocks, it is possible, according to the present
invention, to implement a fractal coding applied in the
DCT domain rather than in the space domain of picture
data, as is customary.

Therefore, another important aspect of the invention
is a new picture data compressing and coding method that
is made practical by a hardware structure
calculating the DCT on blocks of scalable size and
which includes:

subdividing a picture by defining two
distinct types of subdivision blocks: a first type,
of N/i*N/i dimension, called range blocks, that are not
superimposable one on another, and a second type, of
N*N dimension, called domain blocks, that are
shiftable by intervals of N/i pixels and
superimposable one on another (by shifting on the
original picture a window that identifies a domain
block by an interval equivalent to the horizontal
and/or vertical dimension of a range block);

calculating the discrete cosine transform (DCT) of
the 2^i range blocks and of a relative domain block in
parallel;

classifying the transformed range blocks according
to their relative complexity calculated by summing
the three AC coefficients;

applying the fractal transform in the DCT domain
to the data of range blocks whose complexity exceeds a
pre-defined threshold and storing only the DC
coefficient of the range blocks with a complexity
lower than said threshold, identifying a relative
domain block belonging to the range block being
transformed that produces the best fractal
appropriate linear transform, for example, rotations, φ,
overturns, τ, or the like, and a domain block DCT,
defined by F_D(u,v), which at least
approximately satisfy the following equation:

\[ F_R(u,v) = t(F_D(u,v)) \tag{1} \]

Having so identified the domain block most similar
or homologous to the range block that is being
processed, its parameters F_D(u,v), τ, φ are stored. The
difference picture between the range block and its
fractal approximation is then calculated:

\[ D(u,v) = F_R(u,v) - t(F_D(u,v)) \tag{2} \]

By quantizing the difference picture D(u,v),

\[ D_Q(u,v) = \mathrm{INTEG}\left[ D(u,v) / Q(u,v) \right] \tag{3} \]

is obtained, where:

D_Q(u,v) is the quantized difference picture in the
domain of DCT;

Q(u,v) is a quantization table designed by
considering human sight characteristics;

INTEG is a function that approximates its argument
to the nearest integer.

After quantization, the majority of the D_Q(u,v)
coefficients are null. Therefore, it is easy to design a
coding, for example Huffman coding, based on the
probabilities of the coefficients. Finally, the code to
be recorded or transmitted is stored. The compression
procedure terminates when each range block has been
coded.
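
The classification and quantization steps described above can
be sketched as follows. This is only an illustration under
stated assumptions, not the circuit of the invention: the
function names are hypothetical, the choice of the three AC
coefficients (positions (0,1), (1,0) and (1,1), taken in
magnitude) is an assumption because the text does not say which
ones are summed, and the fractal approximation is assumed to be
supplied by the caller. NumPy's rint stands in for INTEG and
rounds exact halves to the nearest even integer.

```python
import numpy as np

def complexity(F_R):
    """Sum of three AC coefficients of a DCT-domain range block.

    Assumption: the three lowest-frequency AC terms, in magnitude.
    """
    return abs(F_R[0, 1]) + abs(F_R[1, 0]) + abs(F_R[1, 1])

def quantize_difference(F_R, approx, Q):
    """Equations (2) and (3): D = F_R - approx, D_Q = INTEG[D / Q]."""
    D = F_R - approx                    # difference picture, eq. (2)
    return np.rint(D / Q).astype(int)   # nearest-integer quantization, eq. (3)
```

A caller would compare complexity(F_R) with the pre-defined
threshold and quantize only the blocks that exceed it.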



Brief Description of the Drawings 

The different aspects and implementations of the
scalable architecture for calculating the discrete
cosine transform of the invention, as well as of the
method of compression and fractal coding, will be more
easily understood through the following detailed
description of an embodiment of the architecture of the
invention and of the different functioning modes
according to the selected size of the
blocks of pixels into which the picture is divided, by
referring to the annexed drawings, wherein:

Figure 1 is a block diagram of a coder effecting
hybrid compression based on fractal coding and DCT,
according to the present invention;

Figure 2 is a flow chart of the parallel computation
of the DCT of sixteen blocks of 2x2-pixel size;

Figure 3 illustrates the architecture for parallel
computation of sixteen 2x2 DCTs;

Figure 4 shows the arrangement of the input data for
calculating the sixteen 2x2 DCTs;

Figure 5 shows the PROCESS phase of the calculating
procedure of sixteen 2x2 DCTs;

Figure 6 illustrates the architecture for parallel
computation of four 4x4 DCTs;

Figure 7 shows the arrangement of the input data for
calculating the four 4x4 DCTs;

Figure 8 shows the PROCESS phase of the calculating
procedure of four 4x4 DCTs;

Figure 9 illustrates the architecture for parallel
computation of an 8x8 DCT;

Figure 10 shows the arrangement of the input data for
calculating an 8x8 DCT;

Figure 11 shows the PROCESS phase for the calculating
procedure of an 8x8 DCT;

Figure 12 shows the scalable hardware architecture
of the invention for calculating an 8x8 DCT or four 4x4
DCTs in parallel or sixteen 2x2 DCTs in parallel;

Figure 13 shows the INPUT structure of the scalable
architecture of the invention;

Figure 14 shows the PROCESS structure for the
scalable architecture of the invention;

Figure 15 shows the functional schemes of the blocks
that implement the PROCESS phase in the scalable
architecture of the invention;



Figure 16 is a detailed scheme of the QA block;

Figure 17 is a detailed scheme of the QB block;

Figure 18 is a detailed scheme of the QC block;

Figure 19 is a detailed scheme of the QD block;

Figure 20 is a detailed scheme of the QE block;

Figure 21 is a detailed scheme of the QF block;

Figure 22 is a detailed scheme of the QG block;

Figure 23 illustrates an implementation of the ORDER
phase in the scalable architecture of the invention;
and

Figure 24 illustrates an implementation of the OUTPUT
phase in the scalable architecture of the invention.

Detailed Description of the Preferred Embodiments 

Though some of the schemes illustrated
in the figures refer to a particularly significant and
effective implementation of the architecture for parallel
computation of the discrete cosine transform (DCT) on
blocks of pixels of scalable size, which comprises a
compression phase for the fractal coding of the picture
data, it is understood that the method and architecture
for parallel calculation of the discrete cosine
transform (DCT) of a bidimensional matrix of input
data by blocks of a scalable size provide
exceptional freedom in implementing particularly
effective compression algorithms by exploiting the
scalability and the possibility of a parallel
calculation of the DCT.

The partitioning steps and the calculation of the 
discrete cosine transform of a bidimensional matrix of 
input data will be described separately for each size of 
range block, according to an embodiment of the 
invention, starting from the smallest block dimension of 
2x2 for which the DCT calculation is performed in 
parallel, up to the maximum block dimension of 8x8. 
A description of an architecture that is scalable, according
to needs, by changing the value of the global variable
size will follow.

The procedure for a parallel DCT computation of the 
invention may be divided in distinct phases: 

INPUT phase 

PROCESS phase 

ORDER phase 

OUTPUT phase. 
Each phase is hereinbelow described for each case 
considered. 

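The DCT operation may be defined as follows.
For an input data matrix X_{N*N} = [x_{i,j}], 0 ≤ i,j ≤ N-1, the
output matrix Y_{N*N} = [y_{m,n}], 0 ≤ m,n ≤ N-1, is defined by:

\[ y_{m,n} = \frac{2}{N}\,\varepsilon(m)\,\varepsilon(n) \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} x_{i,j} \cos\frac{(2i+1)m\pi}{2N} \cos\frac{(2j+1)n\pi}{2N} \tag{4} \]

where:

\[ \varepsilon(n) = \begin{cases} 1/\sqrt{2} & \text{for } n = 0 \\ 1 & \text{for } 1 \le n \le N-1 \end{cases} \]

For convenience, assume that N = 2^i, where i is an
integer and i ≥ 1. Let us remove ε(m), ε(n), and the
normalization value 2/N from equation (4), in view of
the fact that they may be reintroduced in a successive
step. Therefore, from now on, the following simplified
version of equation (4) will be used:

\[ y_{m,n} = \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} x_{i,j} \cos\frac{(2i+1)m\pi}{2N} \cos\frac{(2j+1)n\pi}{2N} \tag{5} \]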



Parallel computation of sixteen 2x2 DCTs 

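For N=2 equation (5) becomes:

\[ y_{m,n} = \sum_{i=0}^{1}\sum_{j=0}^{1} x_{i,j} \cos\frac{(2i+1)m\pi}{4} \cos\frac{(2j+1)n\pi}{4} \tag{6} \]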

The flow graph for a 2x2 DCT is shown in Fig. 2, in 
which A=B=C=1 and the input and output data are the 
pixels in the positions (0,0) , (0,1) , (1,0) , (1,1) . 
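
As a reference for the definitions above, the simplified
transform of equation (5) and the normalized transform of
equation (4) can be evaluated directly. This is a plain
matrix-product sketch for checking values, not the flow graph
of Fig. 2; the function names are hypothetical.

```python
import numpy as np

def dct2_simplified(x):
    """Equation (5): 2-D DCT without eps(m), eps(n) and the 2/N factor."""
    x = np.asarray(x, dtype=float)
    N = x.shape[0]
    k = np.arange(N)
    # C[m, i] = cos((2i + 1) * m * pi / (2N))
    C = np.cos((2 * k[None, :] + 1) * k[:, None] * np.pi / (2 * N))
    return C @ x @ C.T

def dct2_full(x):
    """Equation (4): reintroduce eps(m), eps(n) and the 2/N factor."""
    N = np.asarray(x).shape[0]
    eps = np.ones(N)
    eps[0] = 1.0 / np.sqrt(2.0)
    return (2.0 / N) * np.outer(eps, eps) * dct2_simplified(x)
```

For a 2x2 block of ones, dct2_simplified returns 4 in position
(0,0) and 0 elsewhere, since only the sum term survives.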

Let us now consider how to calculate in parallel 
sixteen 2x2 DCTs in which an 8x8 block is subdivided. 
The procedure is divided in many steps, a global view of 
which is depicted in Fig. 3. This figure highlights the 
transformations performed on the 2x2 block constituted 
by the pixels (0,6) , (0,7) , (1,6) , (1,7) . 

The pixels that constitute the input block are
ordered in the INPUT phase and are processed in the
PROCESS phase to obtain the coefficients of the sixteen
bidimensional DCTs, or briefly 2-D DCTs, on four samples;
for example, the 2-D DCT of the block (0,1) constituted
by:

{l[0], m[0], n[0], o[0]} is {a[0], b[0], c[0], d[0]}.

The coefficients of the 2-D DCT are rearranged in
the ORDER phase into eight vectors of eight components.
For example, the coefficients {a[0], b[0], c[0], d[0]}
constitute the vector l'. The vectors thus obtained
proceed to the OUTPUT phase to give the coefficients of
the 2x2 DCT, constituting the output block.
INPUT phase

The pixels of each block (i,j), with 0 ≤ i ≤ 1 and
0 ≤ j ≤ 3, are ordered to constitute the eight-component
vectors l, m, n, o in the following manner:

the pixels that occupy the position (0,0) in the
block constitute the vector l;

the pixels that occupy the position (0,1) in the
block constitute the vector m;

the pixels that occupy the position (1,0) in the
block constitute the vector n;

the pixels that occupy the position (1,1) in the
block constitute the vector o.

Similarly, the pixels of each block (i,j), with
2 ≤ i ≤ 3 and 0 ≤ j ≤ 3, are ordered to constitute the
eight-component vectors p, q, r, s in the following
manner:

the pixels that occupy the position (0,0) in the
block constitute the vector p;

the pixels that occupy the position (0,1) in the
block constitute the vector q;

the pixels that occupy the position (1,0) in the
block constitute the vector r;

the pixels that occupy the position (1,1) in the
block constitute the vector s.

This arrangement is detailed in Fig. 4. It should be
noted, for example, that the pixels of the block (0,3)
will constitute the third component of the l, m, n, o
vectors.
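
The INPUT ordering for the sixteen 2x2 blocks can be sketched
as below. The function name is hypothetical, and two points are
assumptions: the blocks (i,j) are visited in row-major order,
which fixes the component index inside each vector, and the
vector s is taken from position (1,1) by symmetry with the
vector o. Fig. 4 remains the authoritative arrangement.

```python
import numpy as np

def input_phase_2x2(block8):
    """Split an 8x8 block into the eight-component vectors l..s."""
    b = np.asarray(block8)
    vecs = {name: [] for name in "lmnopqrs"}
    positions = [(0, 0), (0, 1), (1, 0), (1, 1)]
    for i in range(4):            # block row
        for j in range(4):        # block column (row-major order assumed)
            names = "lmno" if i <= 1 else "pqrs"
            sub = b[2 * i:2 * i + 2, 2 * j:2 * j + 2]
            for name, (r, c) in zip(names, positions):
                vecs[name].append(sub[r, c])
    return {name: np.array(v) for name, v in vecs.items()}
```

Each returned vector has eight components, one per 2x2 block in
the corresponding half of the 8x8 input block.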
PROCESS phase

The PROCESS phase includes calculating in parallel
the sixteen 2-D DCTs by processing the eight-component
vectors l, m, ..., s as shown in Fig. 5. It is noted, for
example, that the coefficients of the 2-D DCT of the
block (0,3) will constitute the third component of the
vectors a, b, c, d of Fig. 3.

ORDER phase

The ORDER phase includes arranging the output
sequences of the sixteen 2-D DCTs in eight vectors l', m',
..., s' thus defined:



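\[
\begin{aligned}
l' &= [\,a[0],\ b[0],\ c[0],\ d[0],\ a[1],\ b[1],\ c[1],\ d[1]\,]^T \\
m' &= [\,a[2],\ b[2],\ c[2],\ d[2],\ a[3],\ b[3],\ c[3],\ d[3]\,]^T \\
n' &= [\,a[4],\ b[4],\ c[4],\ d[4],\ a[5],\ b[5],\ c[5],\ d[5]\,]^T \\
o' &= [\,a[6],\ b[6],\ c[6],\ d[6],\ a[7],\ b[7],\ c[7],\ d[7]\,]^T \\
p' &= [\,e[0],\ f[0],\ g[0],\ h[0],\ e[1],\ f[1],\ g[1],\ h[1]\,]^T \\
q' &= [\,e[2],\ f[2],\ g[2],\ h[2],\ e[3],\ f[3],\ g[3],\ h[3]\,]^T \\
r' &= [\,e[4],\ f[4],\ g[4],\ h[4],\ e[5],\ f[5],\ g[5],\ h[5]\,]^T \\
s' &= [\,e[6],\ f[6],\ g[6],\ h[6],\ e[7],\ f[7],\ g[7],\ h[7]\,]^T
\end{aligned}
\]

It is noted, for example, that the coefficients of
the 2-D DCT of the block (0,3) will constitute the
components 4, 5, 6, 7 of the vector m'.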



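OUTPUT phase

This phase includes rearranging the output data:
starting from the eight-component vectors a, b, ..., h,
a 64-component output block [y[0] y[1] ... y[63]], defined as
follows, is constructed:

\[
\begin{bmatrix}
l[0] & l[1] & l[4] & l[5] & m[0] & m[1] & m[4] & m[5] \\
l[2] & l[3] & l[6] & l[7] & m[2] & m[3] & m[6] & m[7] \\
n[0] & n[1] & n[4] & n[5] & o[0] & o[1] & o[4] & o[5] \\
n[2] & n[3] & n[6] & n[7] & o[2] & o[3] & o[6] & o[7] \\
p[0] & p[1] & p[4] & p[5] & q[0] & q[1] & q[4] & q[5] \\
p[2] & p[3] & p[6] & p[7] & q[2] & q[3] & q[6] & q[7] \\
r[0] & r[1] & r[4] & r[5] & s[0] & s[1] & s[4] & s[5] \\
r[2] & r[3] & r[6] & r[7] & s[2] & s[3] & s[6] & s[7]
\end{bmatrix}
\]

Parallel computation of four 4x4 DCTs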











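For N = 4, equation (5) becomes:

\[ y_{m,n} = \sum_{i=0}^{3}\sum_{j=0}^{3} x_{i,j} \cos\frac{(2i+1)m\pi}{8} \cos\frac{(2j+1)n\pi}{8} \tag{7} \]

If:

[Definitions of the stacked vectors Y_{16x1} and F_{16x1} = [f_0 f_1 f_2 f_3]^T; not legible in the source.]

\[
\begin{aligned}
\{A_{1,i}\}_{i=0}^{3} &= \{x_{0,0},\ x_{1,1},\ x_{2,2},\ x_{3,3}\} \\
\{A_{3,i}\}_{i=0}^{3} &= \{x_{1,0},\ x_{3,1},\ x_{0,2},\ x_{2,3}\} \\
\{B_{3,i}\}_{i=0}^{3} &= \{x_{2,0},\ x_{0,1},\ x_{3,2},\ x_{1,3}\} \\
\{B_{1,i}\}_{i=0}^{3} &= \{x_{3,0},\ x_{2,1},\ x_{1,2},\ x_{0,3}\}
\end{aligned}
\]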
it may be demonstrated that:

\[ Y_{16\times 1} = (E_4)_{16}\, F_{16\times 1} \tag{9} \]

[Equation (10): block-matrix definition of (E_4)_16 in terms of the matrices ±(H_0)_4, ±(H_1)_4, ±(H_2)_4 and ±(H_3)_4; the arrangement is not legible in the source.]

The matrices (H_i)_4, i = 0, 1, 2, 3, are as follows:

[Matrices (H_0)_4 to (H_3)_4; the entries are not legible in the source.]

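The monodimensional DCT, or briefly the 1-D DCT, is
expressed by the matrix (1-D DCT)_4 given by:

\[
C_4 = \begin{bmatrix}
1 & 1 & 1 & 1 \\
C_8^1 & C_8^3 & -C_8^3 & -C_8^1 \\
C_4^1 & -C_4^1 & -C_4^1 & C_4^1 \\
C_8^3 & -C_8^1 & C_8^1 & -C_8^3
\end{bmatrix}
\]

where C_n^m = cos(mπ/n).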



From these equations it may be said that the
computation of one 4x4 DCT may be divided into two
steps:

computation of four 1-D DCTs, each performed on an
appropriate sequence of four pixels;

computation of the 2-D DCT starting from the four
1-D DCTs.

These two steps are carried out in a similar manner,
and are implemented with the same hardware, which is used
twice. Let us consider now how to calculate in parallel
four 4x4 DCTs. The total 64 samples are obtained from
the 4x4 blocks into which an 8x8 block is subdivided.
The procedure is subdivided into distinct phases, to each
of which corresponds an architectural block. An overall
view is shown in Fig. 6. This figure highlights the
transformations carried out on each 4x4 block.
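
The idea of running one arithmetic routine twice, with a
reordering in between, can be illustrated with the ordinary
row-column decomposition of the 2-D DCT. This is a swapped-in
textbook technique used only to show the two-pass control flow:
the architecture described here works on the diagonal pixel
sequences A and B, not on rows and columns, and the function
names below are hypothetical.

```python
import numpy as np

def dct1d_matrix(N):
    """Unnormalized 1-D DCT matrix: C[m, i] = cos((2i + 1) m pi / (2N))."""
    k = np.arange(N)
    return np.cos((2 * k[None, :] + 1) * k[:, None] * np.pi / (2 * N))

def process(vectors, C):
    """One arithmetic pass: a 1-D DCT applied to every row vector."""
    return vectors @ C.T

def dct2_two_pass(x):
    """2-D DCT of equation (5) obtained by reusing process() twice."""
    x = np.asarray(x, dtype=float)
    C = dct1d_matrix(x.shape[0])
    stage0 = process(x, C)          # first pass: 1-D DCTs of the rows
    reordered = stage0.T            # transpose plays the role of a reordering
    stage1 = process(reordered, C)  # second pass: same routine reused
    return stage1.T
```

The result agrees with a direct evaluation of the double sum of
equation (5) for the same block.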
INPUT phase

The pixels of each quadrant (i,j), 0 ≤ i,j ≤ 1, are
ordered to constitute the vectors:

\[
\begin{aligned}
A_1^{i,j} &= [\,x^{i,j}_{0,0},\ x^{i,j}_{1,1},\ x^{i,j}_{2,2},\ x^{i,j}_{3,3}\,]^T \\
A_3^{i,j} &= [\,x^{i,j}_{1,0},\ x^{i,j}_{3,1},\ x^{i,j}_{0,2},\ x^{i,j}_{2,3}\,]^T \\
B_3^{i,j} &= [\,x^{i,j}_{2,0},\ x^{i,j}_{0,1},\ x^{i,j}_{3,2},\ x^{i,j}_{1,3}\,]^T \\
B_1^{i,j} &= [\,x^{i,j}_{3,0},\ x^{i,j}_{2,1},\ x^{i,j}_{1,2},\ x^{i,j}_{0,3}\,]^T
\end{aligned}
\]

After arranging the data in 16 four-component
vectors, we define the eight-component vectors l, m, n,
o, constituted by the first, second, third and fourth
components, respectively, of the initial vectors
constituted by the pixels of the 00 and 01 quadrants,
and the p, q, r, s vectors constituted by the first,
second, third and fourth components, respectively, of
the initial vectors constituted by the pixels of the 10
and 11 quadrants. Precisely:

\[
\begin{aligned}
l &= [\,A_1[0]^{0,0},\ A_3[0]^{0,0},\ B_3[0]^{0,0},\ B_1[0]^{0,0},\ A_1[0]^{0,1},\ A_3[0]^{0,1},\ B_3[0]^{0,1},\ B_1[0]^{0,1}\,]^T \\
p &= [\,A_1[0]^{1,0},\ A_3[0]^{1,0},\ B_3[0]^{1,0},\ B_1[0]^{1,0},\ A_1[0]^{1,1},\ A_3[0]^{1,1},\ B_3[0]^{1,1},\ B_1[0]^{1,1}\,]^T
\end{aligned}
\]

with m, n, o obtained from l, and q, r, s obtained from p, by
replacing the component index [0] with [1], [2] and [3],
respectively.

By taking into account the way in which the vectors
A_1^{i,j}, A_3^{i,j}, B_3^{i,j}, B_1^{i,j} are defined, the
arrangement detailed in Fig. 7 is obtained. It should be noted
that in this figure the original 8x8 block is subdivided into
the four 4x4 quadrants; within each quadrant the pixels
belonging to the respective vectors A_1, A_3, B_3, B_1
have different shadings.

According to what has been described above, the
computation of a 4x4 DCT may be subdivided into two
stages; consequently, the PROCESS phase, which is the only
phase in which arithmetical operations are performed, is
done twice:

a first time, to compute in parallel the sixteen
1-D DCTs;

a second time, to compute in parallel the four 4x4 DCTs
starting from the coefficients of the 1-D DCTs.

The variable stage indicates whether the first or
second calculation stage is being performed.

During the INPUT phase the variable stage is set
to the value 0.

At the input of the PROCESS phase, there are 64 input
MUXes that are controlled by the variable stage. Each MUX
receives two inputs:

a pixel of the original picture, coming from the
INPUT phase (this input is selected when stage = 0);

a coefficient of a 1-D DCT, coming from the ORDER
phase (this input is selected when stage = 1).
PROCESS phase

This phase includes processing the l, m, ..., s
vectors as shown in Fig. 8. In this figure the following
symbols are used:

[Equations (11) to (14): definitions of the symbols used in Fig. 8, each with one value for stage = 0 and another for stage = 1; the expressions are not legible in the source.]

At the output of the PROCESS structure there are 64
DEMUXes controlled by the variable stage. The DEMUXes
route the data according to two conditions:

if stage = 0, the input datum to each DEMUX is a
coefficient of a 1-D DCT; therefore the datum must be
further processed and, for this purpose, is conveyed
to the ORDER phase;

if stage = 1, the input datum to each DEMUX is a
coefficient of a 2-D DCT; therefore the datum must not
be processed further and is conveyed to the
OUTPUT phase.
ORDER phase

The ORDER phase includes arranging the output
sequences of the sixteen 1-D DCTs in eight l', m', ..., s'
vectors, thus defined:

[Definitions of the vectors l', m', ..., s' in terms of the components of the vectors a, b, ..., h; not legible in the source.]

After the ORDER phase the variable stage is updated
to the value 1. The output data from the ORDER phase are
sent to the PROCESS phase.
OUTPUT phase 

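This phase includes rearranging the data originating
from the second (stage = 1) execution of the PROCESS
step: starting from these data, which constitute the
eight-component vectors a, b, ..., h, the output block
Y_{N*N} is thus defined:

\[
Y_{N\times N} = \begin{bmatrix}
a[0] & b[0] & c[0] & d[0] & e[0] & f[0] & g[0] & h[0] \\
\vdots & & & & & & & \vdots \\
a[3] & b[3] & c[3] & d[3] & e[3] & f[3] & g[3] & h[3] \\
e[4] & f[4] & g[4] & h[4] & a[4] & b[4] & c[4] & d[4] \\
\vdots & & & & & & & \vdots \\
e[7] & f[7] & g[7] & h[7] & a[7] & b[7] & c[7] & d[7]
\end{bmatrix}
\tag{15}
\]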



The main differences between the hardware for
calculating the four 4x4 DCTs and the hardware needed
for the sixteen 2x2 DCTs are the following:

the ordering sequences of the pixels of the block
of the original picture depend on the chosen DCT size;

to execute the sixteen 2x2 DCTs the PROCESS step
must be carried out only once; instead, to execute the
four 4x4 DCTs the PROCESS step must be repeated two
times;

the operations executed during the PROCESS phase
are not always the same for the two cases.
Computation of an 8X8 DCT 

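For N = 8, equation (5) becomes:

\[ y_{m,n} = \sum_{i=0}^{7}\sum_{j=0}^{7} x_{i,j} \cos\frac{(2i+1)m\pi}{16} \cos\frac{(2j+1)n\pi}{16} \tag{16} \]

Putting:

\[
Y_{64\times 1} = \begin{bmatrix} y_0 \\ y_1 \\ \vdots \\ y_7 \end{bmatrix}, \qquad
F_{64\times 1} = \begin{bmatrix} f_0 \\ f_1 \\ \vdots \\ f_7 \end{bmatrix}
\]

where:

\[ y_i = [\,y_{i,0},\ y_{i,1},\ \ldots,\ y_{i,7}\,]^T \]

[The definition of the vectors f_i is not legible in the source.]

and:

\[
\begin{aligned}
\{A_{1,i}\}_{i=0}^{7} &= \{x_{0,0},\ x_{1,1},\ x_{2,2},\ x_{3,3},\ x_{4,4},\ x_{5,5},\ x_{6,6},\ x_{7,7}\} \\
\{A_{3,i}\}_{i=0}^{7} &= \{x_{1,0},\ x_{4,1},\ x_{7,2},\ x_{5,3},\ x_{2,4},\ x_{0,5},\ x_{3,6},\ x_{6,7}\} \\
\{A_{5,i}\}_{i=0}^{7} &= \{x_{2,0},\ x_{7,1},\ x_{3,2},\ x_{1,3},\ x_{6,4},\ x_{4,5},\ x_{0,6},\ x_{5,7}\} \\
\{A_{7,i}\}_{i=0}^{7} &= \{x_{3,0},\ x_{5,1},\ x_{1,2},\ x_{7,3},\ x_{0,4},\ x_{6,5},\ x_{2,6},\ x_{4,7}\} \\
\{B_{7,i}\}_{i=0}^{7} &= \{x_{4,0},\ x_{2,1},\ x_{6,2},\ x_{0,3},\ x_{7,4},\ x_{1,5},\ x_{5,6},\ x_{3,7}\} \\
\{B_{5,i}\}_{i=0}^{7} &= \{x_{5,0},\ x_{0,1},\ x_{4,2},\ x_{6,3},\ x_{1,4},\ x_{3,5},\ x_{7,6},\ x_{2,7}\} \\
\{B_{3,i}\}_{i=0}^{7} &= \{x_{6,0},\ x_{3,1},\ x_{0,2},\ x_{2,3},\ x_{5,4},\ x_{7,5},\ x_{4,6},\ x_{1,7}\} \\
\{B_{1,i}\}_{i=0}^{7} &= \{x_{7,0},\ x_{6,1},\ x_{5,2},\ x_{4,3},\ x_{3,4},\ x_{2,5},\ x_{1,6},\ x_{0,7}\}
\end{aligned}
\]

it may be demonstrated that:

\[ Y_{64\times 1} = (E_8)_{64}\, F_{64\times 1} \tag{17} \]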
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The 1-D DCT 
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is expressed by the matrix: 



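$$
\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
C_{16}^{1} & C_{16}^{3} & C_{16}^{5} & C_{16}^{7} & -C_{16}^{7} & -C_{16}^{5} & -C_{16}^{3} & -C_{16}^{1} \\
C_{8}^{1} & C_{8}^{3} & C_{8}^{5} & C_{8}^{7} & C_{8}^{7} & C_{8}^{5} & C_{8}^{3} & C_{8}^{1} \\
C_{16}^{3} & -C_{16}^{7} & -C_{16}^{1} & -C_{16}^{5} & C_{16}^{5} & C_{16}^{1} & C_{16}^{7} & -C_{16}^{3} \\
C_{4}^{1} & -C_{4}^{1} & -C_{4}^{1} & C_{4}^{1} & C_{4}^{1} & -C_{4}^{1} & -C_{4}^{1} & C_{4}^{1} \\
C_{16}^{5} & -C_{16}^{1} & C_{16}^{7} & C_{16}^{3} & -C_{16}^{3} & -C_{16}^{7} & C_{16}^{1} & -C_{16}^{5} \\
C_{8}^{3} & -C_{8}^{1} & C_{8}^{1} & C_{8}^{5} & C_{8}^{5} & C_{8}^{1} & -C_{8}^{1} & C_{8}^{3} \\
C_{16}^{7} & -C_{16}^{5} & C_{16}^{3} & -C_{16}^{1} & C_{16}^{1} & -C_{16}^{3} & C_{16}^{5} & -C_{16}^{7}
\end{bmatrix}
$$

where we put

$$C_n^m = \cos\left(\frac{m}{n}\,\pi\right)$$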
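
As an illustrative check only (a minimal sketch, not part of the disclosed hardware), the matrix above can be generated from the coefficients C_n^m = cos(m*pi/n). The function names `c` and `dct_matrix` are hypothetical, and normalization factors are omitted, as in the matrix shown.

```python
import math

def c(m, n):
    """Coefficient C_n^m = cos(m * pi / n)."""
    return math.cos(m * math.pi / n)

def dct_matrix(size=8):
    """Unnormalized 1-D DCT matrix: entry (k, j) is cos((2j+1) * k * pi / (2 * size))."""
    return [[c((2 * j + 1) * k, 2 * size) for j in range(size)]
            for k in range(size)]

T = dct_matrix(8)

# Second row of the matrix, written with the C_16^m coefficients.
row1 = [c(1, 16), c(3, 16), c(5, 16), c(7, 16),
        -c(7, 16), -c(5, 16), -c(3, 16), -c(1, 16)]
assert all(abs(a - b) < 1e-12 for a, b in zip(T[1], row1))
```

The same generator can be used to compare any other row against its closed form.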

From the above equations it is evident that the
computation of an 8x8 DCT may be subdivided into two
stages:

calculating eight 1-D DCTs, each for a certain
sequence of eight pixels;

calculating the 2-D DCT, starting from the eight
1-D DCTs.

These two stages may be executed on the same
hardware by using it twice. The processing is subdivided
into different steps, each of which corresponds to an
architectural block. An overall view of the hardware is
shown in Fig. 9.
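
The two-stage idea can be sketched in software as follows. This is a hedged illustration, not the disclosed circuit: it assumes a plain row-then-column ordering rather than the pixel ordering of Fig. 10, it omits normalization factors, and the names `dct_1d`, `dct_2d_two_pass` and `dct_2d_direct` are hypothetical. The same 1-D routine is applied twice, with a reordering (here a transposition) between the two passes.

```python
import math

def dct_1d(v):
    """Unnormalized 1-D DCT of a sequence."""
    n = len(v)
    return [sum(v[j] * math.cos((2 * j + 1) * k * math.pi / (2 * n))
                for j in range(n))
            for k in range(n)]

def transpose(m):
    return [list(r) for r in zip(*m)]

def dct_2d_two_pass(block):
    """2-D DCT obtained by running the same 1-D routine twice."""
    first = [dct_1d(row) for row in block]        # first pass: eight 1-D DCTs
    reordered = transpose(first)                  # reordering between passes
    second = [dct_1d(seq) for seq in reordered]   # second pass completes the 2-D DCT
    return transpose(second)

def dct_2d_direct(block):
    """Reference: direct double sum of the unnormalized 2-D DCT."""
    n = len(block)
    cos = lambda a, k: math.cos((2 * a + 1) * k * math.pi / (2 * n))
    return [[sum(block[i][j] * cos(i, u) * cos(j, v)
                 for i in range(n) for j in range(n))
             for v in range(n)]
            for u in range(n)]

block = [[(i * 8 + j) % 7 for j in range(8)] for i in range(8)]
two_pass = dct_2d_two_pass(block)
direct = dct_2d_direct(block)
assert all(abs(two_pass[u][v] - direct[u][v]) < 1e-9
           for u in range(8) for v in range(8))
```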
INPUT phase 

The pixels of the 8x8 input block are ordered to
constitute the eight-component vectors l, m, n, o, p, q,
r, s:

l = [A1[0], A3[0], A5[0], A7[0], B7[0], B5[0], B3[0], B1[0]]^T
m = [A1[1], A3[1], A5[1], A7[1], B7[1], B5[1], B3[1], B1[1]]^T
...
s = [A1[7], A3[7], A5[7], A7[7], B7[7], B5[7], B3[7], B1[7]]^T

By taking into account the way in which the vectors
A1, A3, A5, A7, B7, B5, B3, B1 are defined, we obtain the
detailed arrangement of Fig. 10. It should be noted
that in this figure the pixels belonging to the vectors
A1, A3, A5, A7, B7, B5, B3, B1 are marked by
different shadings.

As shown above, the computation of an 8x8 DCT may be
subdivided into two stages. The PROCESS step, which is
the only phase in which mathematical operations are
performed, is performed twice:

the first time, to compute in parallel the eight 1-D
DCTs;

the second time, to compute the 8x8 DCT starting
from the coefficients of the eight 1-D DCTs.

The variable stage indicates whether the first or
second calculation step is being performed. During the
INPUT phase, the variable stage is set to the value
0.

At the input of the PROCESS structure, there are 64
MUXes controlled by the variable stage. Each MUX receives
two inputs:

a pixel of the original picture, originating from
the INPUT phase (this input is selected when stage =
0);

a coefficient of a 1-D DCT, originating from the
ORDER phase (this input is selected when stage = 1).
PROCESS phase 

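This phase includes processing the l, m, ..., s
vectors as shown in Fig. 11. In this figure, the
following symbols are used:

[Equations (19) to (22): piecewise definitions of the operators used in Fig. 11. For stage = 0 each operator is twice a cosine coefficient multiplied by the identity I_{8x8}; for stage = 1 the operators are the matrices (H_2)_8, -(H_6)_8 and one further H matrix. A scalar factor equals 1 for stage = 0 and 2 for stage = 1. The remaining indices are only partly legible in the source.]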

At the output of the PROCESS structure there are 64
DEMUXes controlled by the variable stage. The DEMUXes
route the data according to two possibilities:

if stage = 0, the input datum to each DEMUX is a
coefficient of a 1-D DCT; the datum must therefore be
processed further and, for this purpose, is sent to
the ORDER phase;

if stage = 1, the input datum to each DEMUX is a
coefficient of a 2-D DCT; the datum does not need any
further processing and is sent to the OUTPUT phase.
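
The stage-controlled routing around the PROCESS structure can be mimicked in a few lines. This is only a behavioural sketch under stated assumptions: one function call stands for one of the 64 MUXes or DEMUXes, and the names `mux_input` and `demux_destination` are hypothetical.

```python
def mux_input(stage, pixel, coefficient):
    """Input MUX: pass the pixel on the first pass, the 1-D DCT coefficient on the second."""
    if stage not in (0, 1):
        raise ValueError("stage must be 0 or 1")
    return pixel if stage == 0 else coefficient

def demux_destination(stage):
    """Output DEMUX: 1-D coefficients go to ORDER, 2-D coefficients go to OUTPUT."""
    if stage not in (0, 1):
        raise ValueError("stage must be 0 or 1")
    return "ORDER" if stage == 0 else "OUTPUT"

assert mux_input(0, pixel=17, coefficient=-3.5) == 17
assert demux_destination(1) == "OUTPUT"
```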
ORDER phase 

This phase includes arranging the output sequences of
the eight 1-D DCTs in eight vectors l', m', ..., s', thus
defined:

l' = [a[0], b[0], c[0], d[0], e[0], f[0], g[0], h[0]]^T
m' = [a[1], b[1], c[1], d[1], e[1], f[1], g[1], h[1]]^T
...
s' = [a[7], b[7], c[7], d[7], e[7], f[7], g[7], h[7]]^T

Following the ORDER phase the variable stage is
updated to the value 1. The output data from the ORDER
phase are sent to the PROCESS phase.
OUTPUT phase

This phase includes rearranging the data originating
from the second execution of the PROCESS step (that is,
with stage = 1). Starting from these data, which
constitute the eight-component vectors a, b, ..., h, the
output block Y_{NxN} is constituted as follows:

$$
Y_{N\times N} =
\begin{bmatrix}
y[0] & y[1] & \cdots & y[7] \\
\vdots & \vdots & & \vdots \\
y[56] & y[57] & \cdots & y[63]
\end{bmatrix}
=
\begin{bmatrix}
a[0] & b[0] & c[0] & d[0] & e[0] & f[0] & g[0] & h[0] \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
a[7] & b[7] & c[7] & d[7] & e[7] & f[7] & g[7] & h[7]
\end{bmatrix}
\qquad (23)
$$
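
For the 8x8 case, the ORDER regrouping, taking l' = (a[0], b[0], ..., h[0]) through s' = (a[7], b[7], ..., h[7]), amounts to a transposition of the eight output vectors. The sketch below illustrates this with symbolic labels; the function name `order_phase` is hypothetical and the code is not a description of the actual wiring.

```python
def order_phase(outputs):
    """Regroup [a, b, ..., h] so that primed vector k holds component k of each.

    outputs[0] is a, outputs[1] is b, and so on; the result is
    [l', m', ..., s'] with l' = (a[0], b[0], ..., h[0]).
    """
    return [list(group) for group in zip(*outputs)]

# Symbolic check: label each coefficient by its vector letter and index.
letters = "abcdefgh"
outputs = [[f"{name}[{k}]" for k in range(8)] for name in letters]
primed = order_phase(outputs)
assert primed[0] == [f"{name}[0]" for name in letters]   # l'
assert primed[7][7] == "h[7]"                            # s'[7]
```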



The main differences between the hardware that
calculates an 8x8 DCT and the hardware that calculates
the four 4x4 DCTs are:

the sequences into which the pixels of a block of
the original picture must be arranged depend on the
chosen size of the DCT;

the operations executed during the PROCESS step
are not always the same for the two cases.
Procedure for calculating the DCT for blocks of
scalable size (8x8 DCTs, 4x4 DCTs and 2x2 DCTs)

From the procedures described above, an algorithm for
calculating a chosen one of an 8x8 DCT, four 4x4 DCTs (in
parallel) or sixteen 2x2 DCTs (in parallel) may be
derived. The selection is made by the user by assigning
a value to the global variable size:

0 for an 8x8 DCT

1 for four 4x4 DCTs

2 for sixteen 2x2 DCTs
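
The mapping from the variable size to the block geometry can be written down compactly. This is a small sketch; the helper name `block_config` and the constant `DOMAIN_SIDE` are hypothetical.

```python
DOMAIN_SIDE = 8  # side of the 8x8 input block

def block_config(size):
    """Return (side of each DCT block, number of DCT blocks computed in parallel)."""
    if size not in (0, 1, 2):
        raise ValueError("size must be 0, 1 or 2")
    side = DOMAIN_SIDE >> size           # 8, 4 or 2
    count = (DOMAIN_SIDE // side) ** 2   # 1, 4 or 16
    return side, count

assert block_config(0) == (8, 1)
assert block_config(1) == (4, 4)
assert block_config(2) == (2, 16)
```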

The procedure is subdivided into various phases
(regardless of the value of the variable size), each
of which corresponds to an architectural block. An overall
view is shown in Fig. 12. Each phase has been organized
so as to provide the partial results corresponding to
the chosen value while minimizing redundancies. Sometimes the
operations performed differ depending on the
value of size. In these cases, the architecture
includes a MUX whose control input is size. The
various phases are examined below, highlighting the
differences with respect to the architectures that have
already been described above:
INPUT phase 

The object of this phase, depicted in Fig. 13, is to
arrange the data so that the 1-D DCTs can be computed
starting from the arranged data. This is done by
inputting the luminance values of the pixels (8x8
matrix) and arranging them in eight-component vectors l,
m, ..., s. For example:

l[0] = x_{0,0};

l[1] = [two cases, one for size = 0 or 1 and one for size = 2; the pixel indices are illegible in the source];

s[7] = x_{0,7} for size = 0, x_{4,7} for size = 1, x_{7,7} for size = 2.
PROCESS phase with stage = 0 

This phase includes calculating in parallel the eight
1-D DCTs by processing the vectors l, m, ..., s as shown
in Fig. 14. In this figure the use of
16 MUXes controlled by the variable size may be observed. The eight
MUXes on the left serve to bypass the operations required
only for the computation of the 8x8 DCT. Thus, the bypass
occurs when size = 1 or 2, while it does not occur for
size = 0. The eight MUXes on the right serve to output
only the result that corresponds to the pre-selected
value of size.

[Equations (24) to (31): piecewise definitions of the operators used in Fig. 14. Three of the operators depend on the pair (stage, size), with separate cases for (0,0) or (0,1), for (0,2) or (1,2), for (1,0), and for (1,1); the remaining operators and a scalar factor depend only on stage, with one case for stage = 0 and one for stage = 1. The matrix entries are not legible in the source.]
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The scheme in Fig. 14 may be subdivided into the 
architectural blocks shown in Fig. 15. For example, 
two vectors each of eight components (each component 
being a pixel, that may have been processed already) are 
input to the QA block, which outputs two vectors of 
eight components: the first vector is the sum of the two 
input vectors, while the second vector is the difference 
between the two input vectors that is successively 
processed with the linear operator A. It should be 
noted that the operators A, B, C, D, E, F, G are 8x8 
matrices . 
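
The behaviour described for the QA block (a sum output, and a difference output passed through the linear operator A) can be sketched as follows. The code is an illustration only: the names `mat_vec` and `qa_block` are hypothetical, and the operator is supplied by the caller because its entries depend on stage and size.

```python
def mat_vec(matrix, vec):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vec)) for row in matrix]

def qa_block(u, v, operator_a):
    """Return (u + v, A(u - v)) for two eight-component input vectors."""
    total = [p + q for p, q in zip(u, v)]
    diff = [p - q for p, q in zip(u, v)]
    return total, mat_vec(operator_a, diff)

# Example with A equal to the 8x8 identity, so the second output is u - v.
identity = [[1 if r == k else 0 for k in range(8)] for r in range(8)]
u = [1, 2, 3, 4, 5, 6, 7, 8]
v = [8, 7, 6, 5, 4, 3, 2, 1]
s, d = qa_block(u, v, identity)
assert s == [9] * 8
assert d == [-7, -5, -3, -1, 1, 3, 5, 7]
```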

At a more detailed level, the
QA, QB, QC blocks are shown in Figures 16, 17
and 18, respectively. In these figures the MUXes are
controlled by three bits, which correspond to the
variable stage (which may take the value 0 or 1, and
thus is represented by one bit) and the variable size
(which may take the value 0, 1 or 2, and thus is
represented by two bits). The blocks QD, QE, QF, QG are
shown in detail in Figures 19, 20, 21 and 22,
respectively. In these figures the MUXes are controlled
by a bit that corresponds to the variable stage.
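
A three-bit MUX control word of this kind can be illustrated as below. The bit layout (stage in the most significant bit, size in the two lower bits) is an assumption made for the sketch, as are the names `control_word` and `decode_control_word`; the source does not specify the encoding.

```python
def control_word(stage, size):
    """Pack stage (1 bit) and size (2 bits) into a 3-bit word (assumed layout)."""
    if stage not in (0, 1):
        raise ValueError("stage must be 0 or 1")
    if size not in (0, 1, 2):
        raise ValueError("size must be 0, 1 or 2")
    return (stage << 2) | size

def decode_control_word(word):
    """Inverse of control_word: return (stage, size)."""
    return (word >> 2) & 0b1, word & 0b11

assert control_word(1, 2) == 0b110
assert decode_control_word(0b110) == (1, 2)
```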
ORDER phase 

The ORDER phase, depicted in Fig. 23, includes
arranging the output sequences of the eight 1-D DCTs in
eight vectors l', ..., s'. For example:

l'[0] = a[0];

l'[4] = e[0] for size = 0, a[4] for size = 1, [value illegible in the source] for size = 2;

s'[7] = h[7];

OUTPUT Phase 

This phase, depicted in Fig. 24, includes
rearranging the data coming from the second execution
of the PROCESS step (that is, with stage = 1). Starting
from these data, constituting the eight-component
vectors a, b, ..., h, the output block Y_{NxN} is
constituted.
For example:

[Piecewise definitions of y[0], of one intermediate entry and of y[63] as functions of size. The entries a[0], b[0], h[7] and s[7] appear among the cases; the assignment of entries to values of size is only partly legible in the source.]



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Description of the Drawings 

A functional block diagram of a picture
compressor-coder according to the present invention
may be represented as shown in Figure 1.

Essentially, the compressor-coder performs a
hybrid compression based on fractal coding in the
DCT domain. This is made possible by the particular
architecture for parallel calculation of the DCT on
blocks of pixels of scalable size, as described
above.

Hereinbelow, the remaining figures are described
one by one:

Figure 2 is a flow graph of the 2x2 DCT generating 
block. 

This block is the "base" block that is repeatedly 
used in the PROCESS phase of all the NxN DCTs, where N 
is a power of 2 . 
In particular: 

the flow graph for a 2x2 DCT is shown in Fig.
2, wherein A = B = C = 1 and the input and output
data are pixels in the positions (0,0), (0,1),
(1,0), (1,1);

for sixteen 2x2 DCTs, the inputs and the
outputs are eight-component vectors and the
following symbols are used, considering A = B = C
= I_{8x8};

for four 4x4 DCTs the inputs and outputs are
eight-component vectors and the symbols of
equations (32) to (34) are used, each defined by one
case for stage = 0 and one case for stage = 1
[matrix entries not legible in the source];

for an 8x8 DCT, the inputs and outputs are
eight-component vectors and the symbols of
equations (35) to (37) are used, each defined by one
case for stage = 0 and one case for stage = 1
[matrix entries not legible in the source];

in the scalable architecture for calculating
an 8x8 DCT or four 4x4 DCTs (in parallel) or
sixteen 2x2 DCTs (in parallel), the inputs and the
outputs are vectors of eight components and the
symbols of equations (38) to (40) are used, each
defined by four cases: (stage, size) = (0,0) or (0,1);
(stage, size) = (0,2) or (1,2); (stage, size) = (1,0);
(stage, size) = (1,1) [matrix entries not legible in
the source].
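
With A = B = C = 1, a 2x2 transform reduces to sums and differences only. The sketch below shows one possible butterfly arrangement; it is an assumption rather than a transcription of Fig. 2, the result equals the 2x2 DCT only up to a constant scale factor, and the name `dct_2x2` is hypothetical.

```python
def dct_2x2(x00, x01, x10, x11):
    """Sum/difference butterflies on the pixels at (0,0), (0,1), (1,0), (1,1).

    Returns (y00, y01, y10, y11), unnormalized.
    """
    s0, d0 = x00 + x01, x00 - x01   # butterfly on the first row
    s1, d1 = x10 + x11, x10 - x11   # butterfly on the second row
    return (s0 + s1, d0 + d1, s0 - s1, d0 - d1)

assert dct_2x2(1, 2, 3, 4) == (10, -2, -4, 0)
```

A constant block gives a non-zero value only in the first output, as expected of a DC coefficient.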

Figure 3 illustrates the architecture for calculating
sixteen 2x2 DCTs in parallel.

The pixels that constitute the input block are
ordered during the INPUT phase and processed during the
PROCESS phase to obtain the coefficients of the sixteen
2-D DCTs on four samples. For example, the 2-D DCT of
the block (0,1) constituted by
{l[0], m[0], n[0], o[0]} is {a[0], b[0], c[0], d[0]}.

The coefficients of the 2-D DCTs are rearranged
during the ORDER phase in eight vectors of eight
components. For example, the coefficients
{a[0], b[0], c[0], d[0]} will constitute the vector l'.

The sixteen two-component vectors so obtained are
sent to the PROCESS phase to obtain the coefficients of
the 2x2 DCT. These coefficients, reordered during the
OUTPUT phase, constitute the output block.

Figure 4 shows the ordering of the input data for 
calculating sixteen 2x2 DCTs. 

This figure shows the way the pixels of the 8x8 input
block are ordered to constitute the vectors of 8
components l, m, ..., s. In each quadrant with
0 <= i,j <= 3, the pixels belonging to the vectors are
symbolized by different shadings. For example, one
such vector is constituted by the pixels {x_{0,0}, x_{1,1}}.

From each of these vectors, the components with the
same index (that is, the pixels with the same column
index) will form a vector of four components. For
example, the vector l is constituted by the elements
{A1[0], B1[0]}.

Therefore, each pixel of the 8x8 input block will
constitute a component of one of the vectors l, m, n, o,
p, q, r, s.

Figure 5 shows the PROCESS phase for calculating
sixteen 2x2 DCTs.

This phase includes processing the eight-component
vectors l, m, ..., s. The PROCESS phase, which is the
only phase in which arithmetical operations are
performed, is executed only once to calculate in
parallel the sixteen 2-D DCTs.

Figure 6 illustrates the architecture for calculating
four 4x4 DCTs.

The pixels that constitute the input block are
ordered in the INPUT phase and processed in the PROCESS
phase to obtain the coefficients of the sixteen 1-D
DCTs on 4 samples. For example, the 1-D DCT of the
sequence {l[0], m[0], n[0], o[0]} is
{a[0], b[0], c[0], d[0]}.

The coefficients of the 1-D DCTs are reordered in the
ORDER phase in 8 vectors of eight components. For
example, the coefficients {a[0], b[0], c[0], d[0]} will
constitute the vector l'.

The 4 four-component vectors so obtained are sent to
the PROCESS phase to obtain the coefficients of the 4x4
DCT. These coefficients, reordered in the OUTPUT phase,
constitute the output block.

Figure 7 shows the arrangement of the input data for
calculating four 4x4 DCTs.

This figure shows how the pixels of the 8x8 input
block are ordered to constitute the eight-component
vectors l, m, ..., s.

In each quadrant with 0 <= i,j <= 3, the pixels
belonging to the different vectors have different
shadings. For example: [equation not legible in the source].

From each of these vectors, the components with the
same index (that is, the pixels with the same column
index) will form a vector of four components. For
example, the vector l is constituted by the elements
{A1[0], A3[0], B3[0], B1[0]}.

The outcome is that each pixel of the input 8x8 block
will constitute one component of one of the vectors l,
m, n, o, p, q, r, s.

Figure 8 depicts the PROCESS phase for calculating
the four 4x4 DCTs.

This phase includes processing the eight-component
vectors l, m, ..., s.

The PROCESS phase, which is the only phase wherein
arithmetical operations are performed, is carried out
twice:

the first time (stage = 0), to calculate in
parallel the sixteen 1-D DCTs;

the second time (stage = 1), to calculate the four
4x4 DCTs starting from the coefficients of the 1-D DCTs.

[Equations (41) to (44): piecewise definitions of the operators used in Fig. 8, each with one case for stage = 0 and one case for stage = 1. The scalar factor of equation (44) equals 1 for stage = 0 and 2 for stage = 1. The matrix entries are not legible in the source.]

Figure 9 illustrates the architecture for calculating
an 8x8 DCT.

The pixels that constitute the input block are
ordered during the INPUT phase and are processed in the
PROCESS phase to obtain the coefficients of the eight
1-D DCTs on 8 samples. For example, the 1-D DCT of the
sequence {l[0], m[0], ..., s[0]} is {a[0], b[0], ...,
h[0]}.

The coefficients of the 1-D DCTs are rearranged
during the ORDER phase in 8 vectors of eight components.
For example, the coefficients {a[0], b[0], ..., h[0]}
will constitute the l' vector.

The 8 eight-component vectors so obtained are sent to
the PROCESS phase to obtain the 8x8 DCT coefficients.
These coefficients, rearranged during the OUTPUT phase,
constitute the output block.

Figure 10 shows the arrangement of the input data for
calculating an 8x8 DCT.

This figure shows how the pixels of the input 8x8
block are arranged to constitute the 8 eight-component
vectors l, m, ..., s. The pixels belonging to the
vectors A1, A3, A5, A7, B7, B5, B3, B1 are symbolized
with different shadings, for example:

$\{A_{1,j}\}_{j=0}^{7} = \{x_{0,0},\ x_{1,1},\ x_{2,2},\ x_{3,3},\ x_{4,4},\ x_{5,5},\ x_{6,6},\ x_{7,7}\}$

From each of these vectors, the components with the
same index (that is, the pixels with the same column
index) will form a vector of eight components. For
example, the vector l is constituted by the elements
{A1[0], A3[0], ..., B1[0]}.

The result is that each pixel of the input 8x8 block
will constitute a component of one of the vectors l, m,
n, o, p, q, r, s.

Figure 11 depicts the PROCESS phase for calculating
an 8x8 DCT.

This phase includes processing the eight-component
vectors l, m, ..., s.

The PROCESS phase, in which arithmetical operations are
performed, is executed twice:

the first time (stage = 0), to calculate in parallel
the eight 1-D DCTs;

the second time (stage = 1), to calculate the 8x8
DCT starting from the coefficients of the 1-D DCTs.

In Fig. 11 the following symbols have been used:

[Equations (45) to (48): piecewise definitions of the operators A, B and C and of a scalar factor. For stage = 0 each operator is twice a cosine coefficient multiplied by the identity I_{8x8} (for example 2 C_8^3 x I_{8x8}); for stage = 1 the operators include the matrices (H_2)_8 and (H_4)_8. The scalar factor of equation (48) takes one value for stage = 0 and another for stage = 1. The remaining indices are only partly legible in the source.]

Figure 12 illustrates a scalable architecture for
calculating an 8x8 DCT or four 4x4 DCTs or sixteen 2x2
DCTs.

The pixels that constitute the input block are ordered
during the INPUT phase and processed during the PROCESS
phase, which calculates:

the 1-D DCTs for size = 0 (that is, for the 8x8
DCT) and for size = 1 (that is, for the 4x4 DCTs);

the 2-D DCTs directly for size = 2 (that is, for
the 2x2 DCTs).

When size = 0 or size = 1, the coefficients are
then rearranged in the ORDER phase into 8 eight-component
vectors, which are sent to the PROCESS phase to obtain
the coefficients of the 2-D DCT. These coefficients,
rearranged in the OUTPUT phase, constitute the output
block.

If size = 2, the coefficients are transmitted
directly to the OUTPUT phase, where they are rearranged
to constitute the output block.
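
The resulting control flow can be summarised with a small sketch keyed on the variable size defined earlier (0 for an 8x8 DCT, 1 for four 4x4 DCTs, 2 for sixteen 2x2 DCTs). The function name `phase_sequence` is hypothetical, and the code lists only the order of the phases, not their arithmetic.

```python
def phase_sequence(size):
    """Return the ordered list of phases traversed for a given size."""
    if size in (0, 1):
        # Two PROCESS passes with an ORDER phase in between.
        return ["INPUT", "PROCESS", "ORDER", "PROCESS", "OUTPUT"]
    if size == 2:
        # A single PROCESS pass yields the 2-D coefficients directly.
        return ["INPUT", "PROCESS", "OUTPUT"]
    raise ValueError("size must be 0, 1 or 2")

assert phase_sequence(0).count("PROCESS") == 2
assert phase_sequence(2) == ["INPUT", "PROCESS", "OUTPUT"]
```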

Figure 13 depicts the INPUT phase for a scalable
architecture.

The inputs are the 64 pixels that constitute the
input block.

The arrangement of the inputs is performed by the
MUXes controlled by the size variable.

The 64 outputs are the 8 vectors of eight components
l, m, ..., s.

Figure 14 depicts the PROCESS phase for a scalable
architecture.

This phase includes calculating in parallel the eight
1-D DCTs by processing the vectors l, m, ..., s as shown
in Fig. 11.

In this figure the use of 16 MUXes controlled by size
may be noticed.

The eight MUXes on the left serve to bypass the
operations that are necessary only for calculating the 8x8 DCT;
therefore, the bypass takes place for size = 1 or 2,
while it does not occur when size = 0.

The eight MUXes on the right serve to output only the
result corresponding to the pre-selected size.

In Fig. 14 the following symbols are used:

[Equations (49) to (56): piecewise definitions of the operators A to G and of a scalar factor equal to 1 for stage = 0 and 2 for stage = 1. Three of the operators have four cases: (stage, size) = (0,0) or (0,1); (stage, size) = (0,2) or (1,2); (stage, size) = (1,0); (stage, size) = (1,1). The remaining operators have one case for stage = 0 and one case for stage = 1. The matrix entries are not legible in the source.]



Figure 15 is a block diagram of the structure that
implements the PROCESS phase.

For example, the QA block receives as an input two
vectors of eight components (each component is a pixel
that may have already been processed) and outputs two
vectors of eight components. The first vector is the sum
of the two input vectors, while the second vector is the
difference between the two input vectors, subsequently
processed with the linear operator A. It should be
noted that the A, B, C, D, E, F, G operators are 8x8
matrices.

Figure 16 is a detailed scheme of the QA block.

This scheme shows the details of the single
components of the two input vectors and the arithmetical
operators (adders etc.) which act on each component. The
results are sent to the MUXes depicted on the right side
of the figure, each of which, depending on the control
variables stage and size, selects only one result, which
constitutes one component of the output vector.

Figure 17 is a detailed scheme of the QB block.

This scheme shows the details of the single
components of the two input vectors and the arithmetical
operators (adders etc.) which act on each component. The
results are sent to the MUXes depicted on the right side
of the figure, each of which, depending on the control
variables stage and size, selects only one result, which
constitutes a component of the output vector.

Figure 18 is a detailed scheme of the QC block.

This scheme shows the details of the single
components of the two input vectors and the arithmetical
operators (adders etc.) acting on each component. The
results are sent to the MUXes depicted on the right side
of the figure, each of which, depending on the control
variables stage and size, selects only one result, which
constitutes one component of the output vector.

Figure 19 is a detailed scheme of the QD block.

This scheme shows the details of the single
components of the two input vectors and the arithmetical
operators (adders etc.) which act on each component. The
results are sent to the MUXes depicted on the right side
of the figure, each of which, depending on the control
variables stage and size, selects only one result, which
constitutes a component of the output vector.

Figure 20 is a detailed scheme of the QE block.

This scheme shows the details of the single
components of the two input vectors and the arithmetical
operators (adders etc.) which act on each component. The
results are sent to the MUXes depicted on the right side
of the figure, each of which, depending on the control
variable stage, selects only one result, which constitutes
one component of the output vector.

Figure 21 is a detailed scheme of the QF block.

This scheme shows the details of the single
components of the two input vectors and the arithmetical
operators (adders etc.) which act on each component. The
results are sent to the MUXes depicted on the right side
of the figure, each of which, depending on the control
variable stage, selects only one result, which
constitutes a component of the output vector.

Figure 22 is a detailed scheme of the QG block.

This scheme shows the details of the single
components of the two input vectors and the arithmetical
operators (adders etc.) which act on each component. The
results are sent to the MUXes depicted on the right side
of the figure, each of which, depending on the control
variable stage, selects only one result, which constitutes
a component of the output vector.

Figure 23 depicts the ORDER phase for the scalable
architecture.

The inputs are constituted by the 64 pixels after
they have been processed through the PROCESS phase.

The arrangement of the inputs is effected by the MUXes
controlled by the variable size.

The 64 outputs are the components of the
eight-component vectors l', m', ..., s'.

Figure 24 depicts the OUTPUT phase for the scalable
architecture.

The inputs are constituted by the 64 2-D DCT
coefficients. The arrangement of the inputs is effected by the
MUXes controlled by the variable size.

The 64 outputs are the pixels that constitute the
output block.
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THAT WHICH IS CLAIMED IS 

1. A method of calculating the discrete cosine
transform (DCT) of blocks of pixels of a picture,
characterized in that it comprises the steps of
defining first subdivision blocks called range blocks,
having a fractional and scalable size N/2^i*N/2^i, where
i is an integer number, in respect to a maximum
pre-defined size of N*N pixels of blocks of division of
said picture, referred to as domain blocks, shiftable
by intervals of N/2^i pixels, and of calculating the
DCT on 2^i range blocks of subdivision of a domain
block of N*N pixels of said picture, in parallel.

2. The method according to claim 1, characterized in
that the calculation of the DCT in parallel on all range
blocks of subdivision of a certain domain block is
carried out in a hardware structure and comprises the
steps of:

a) ordering the pixels in function of a subdivision in
range blocks of a certain dimension by rearranging the
input pixels in a number 2^i of sequences or vectors of
2^i components;

b) calculating in parallel 2^i monodimensional DCTs by
processing said vectors defined in the preceding step
a);

c) arranging the output sequences of the monodimensional
DCTs relative to said 2^i vectors;

d) completing the calculation in parallel of 2^i
bidimensional DCTs by processing said output sequences
of monodimensional DCTs produced in step c);

e) arranging the output sequences of bidimensional DCTs
generated in step d) in a number 2^i of vectors of
bidimensional DCT coefficients.

3. The method according to claim 2, characterized in
that the calculation in parallel of said 2^i
monodimensional DCTs in step b) and the completion of
the parallel calculation of 2^i bidimensional DCTs of step
d) are performed by subdividing the sequences resulting
from step a) and from step c), respectively, in groups
of scalar elements, calculating the sums and differences
thereof by way of adders and subtracters and by
reiteratively multiplying the sum and difference results
by respective coefficients until completing the
calculation of the relative DCT coefficients,
respectively monodimensional and bidimensional.

4. A method of compressing data of a picture to be
stored or transmitted through a fractal coding,
characterized in that the fractal transform is carried
out in the domain of the discrete cosine transform (DCT)
through the following steps:

subdividing a picture in blocks of pixels of said
two distinct types of blocks as defined in claim 1;

calculating in parallel the discrete cosine
transform (DCT) of all the 2^i range blocks and of a
relative domain block;

classifying the transformed range blocks according
to their relative complexity represented by the sum of
the values of the three AC coefficients;

applying the fractal transform in the DCT domain
to the data of the range blocks whose complexity
classification exceeds a pre-defined threshold and
storing only the DC coefficient of the range blocks
with a complexity lower than said threshold,
identifying a relative domain block to which the range
block in a transformation belongs that produces the
best fractal approximation of the range block;

calculating a difference picture between each
range block and its fractal approximation;

quantizing said difference picture in the DCT
domain by using a quantization table preestablished in
function of the characteristics of human sight;

coding said difference picture quantized by a
process based on the probabilities of the quantization
coefficients;

storing or transmitting the coding code of each
range block compressed in the DCT domain and the DC
coefficient of each uncompressed range block.
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METHOD AND SCALABLE ARCHITECTURE FOR PARALLEL 
CALCULATION OF THE DCT OF BLOCKS OF PIXELS OF DIFFERENT 
SIZES AND COMPRESSION THROUGH FRACTAL CODING 

Abstract of the Disclosure 

A method of calculating the discrete cosine transform
(DCT) of blocks of pixels of a picture includes the
steps of defining first subdivision blocks called range
blocks, having a fractional and scalable size N/2^i*N/2^i,
where i is an integer number, with respect to a maximum
pre-defined size of N*N pixels of blocks of division of
the picture, referred to as domain blocks, shiftable by
intervals of N/2^i pixels. The method also includes the
step of calculating the DCT on 2^i range blocks of a
subdivision of a domain block of N*N pixels of the
picture, in parallel.
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FIG. 6 
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